Overparameterized Linear Regression
Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression
One of the major open problems in machine learning is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent even for overparameterized linear regression. In many scenarios, this failure can be attributed to bounds that obscure the crucial interplay between the training algorithm and the underlying data distribution. This paper demonstrates that the generalization behavior of overparameterized models should be analyzed in a manner that is both data-relevant and algorithm-relevant. To formalize this characterization, we introduce a notion called data-algorithm compatibility, which considers the generalization behavior of the entire data-dependent training trajectory instead of the traditional last-iterate analysis. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in such a setting.
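As a concrete illustration of trajectory-level (rather than last-iterate) analysis, the sketch below runs gradient descent on an overparameterized linear regression instance and records the test risk at every iterate. This is a minimal sketch, not the paper's construction: the dimensions, step size, noise level, and isotropic Gaussian design are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's construction):
# track test risk along the entire gradient-descent trajectory of
# overparameterized linear regression, not only at the last iterate.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                  # n < d: overparameterized regime
beta_star = rng.normal(size=d) / np.sqrt(d)     # ground-truth parameter
X = rng.normal(size=(n, d))                     # isotropic Gaussian design
y = X @ beta_star + 0.5 * rng.normal(size=n)

beta = np.zeros(d)                              # GD from zero -> min-norm interpolator
lr = 1.0 / np.linalg.norm(X, 2) ** 2            # step size below 2 / smoothness
risks = []
for t in range(2000):
    beta -= lr * X.T @ (X @ beta - y)           # gradient of 0.5 * ||X beta - y||^2
    # For isotropic x, excess risk E[(x^T (beta - beta_star))^2] = ||beta - beta_star||^2.
    risks.append(float(np.sum((beta - beta_star) ** 2)))

best_t = int(np.argmin(risks))
print(f"risk at best iterate (t={best_t}): {risks[best_t]:.4f}")
print(f"risk at last iterate:             {risks[-1]:.4f}")
```

Comparing the whole risk curve with its endpoint is the distinction the abstract draws: a bound on the last iterate alone can miss better-generalizing iterates that the data-dependent trajectory passes through.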
On the Optimal Weighted \ell_2 Regularization in Overparameterized Linear Regression
Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal setting $\lambda_{\opt}$ for the ridge parameter $\lambda$, and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\opt}$ can be \textit{negative} in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when $\vX$ and $\vbeta_{\star}$ are both anisotropic. Finally, we determine the optimal weighting matrix $\vSigma_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\opt}$) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
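For intuition, the sketch below assumes the weighted objective takes the standard form $\min_{\beta} \|y - X\beta\|_2^2 + \lambda\, \beta^\top \Sigma_w \beta$, whose minimizer is $\hat{\beta}_\lambda = (X^\top X + \lambda \Sigma_w)^{-1} X^\top y$, and sweeps $\lambda$ over a range that includes negative values in the unweighted case $\Sigma_w = I$. The spectrum, signal alignment, and noise level are illustrative assumptions, so the location of the empirical minimizer will vary with them.

```python
# Minimal sketch (illustrative assumptions): sweep the ridge parameter lambda,
# including negative values, for overparameterized ridge regression (Sigma_w = I)
# and locate the empirically optimal lambda.
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 100, 400, 0.5
eigs = 1.0 / np.arange(1, d + 1)                # anisotropic, decaying spectrum
beta_star = np.zeros(d)
beta_star[:10] = 1.0                            # signal aligned with top directions
X = rng.normal(size=(n, d)) * np.sqrt(eigs)     # rows ~ N(0, diag(eigs))
y = X @ beta_star + sigma * rng.normal(size=n)

def test_risk(lam):
    # Dual form beta_hat = X^T (X X^T + lam I)^{-1} y; for n < d it agrees with
    # (X^T X + lam I)^{-1} X^T y and stays well defined for mildly negative lam,
    # as long as X X^T + lam I remains positive definite.
    beta_hat = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    err = beta_hat - beta_star
    return float(err @ (eigs * err))            # E[(x^T err)^2] under this design

lam_floor = np.linalg.eigvalsh(X @ X.T).min()   # keep X X^T + lam I positive definite
lams = np.linspace(-0.9 * lam_floor, 0.5, 141)
risks = [test_risk(lam) for lam in lams]
print(f"empirically optimal lambda: {lams[int(np.argmin(risks))]:.4f}")
```

Whether the minimizer lands at a negative value depends on the covariance spectrum, the alignment of $\beta_{\star}$ with its top eigendirections, and the noise level; the paper's contribution is to characterize precisely when that happens, while this sweep only probes the phenomenon numerically.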
Review for NeurIPS paper: On the Optimal Weighted \ell_2 Regularization in Overparameterized Linear Regression
Weaknesses: The main issue I have with the paper concerns the novelty of the results. The authors mention that previous work on linear regression is not as general as the current work; in particular, prior papers allow only isotropic features or an isotropic signal. The following paper, which was arXived about a month before the NeurIPS deadline, seems to handle both:
[1] Emami, Melikasadat, et al. "Generalization error of generalized linear models in high dimensions."
The results of [1] characterize the exact generalization error in the same asymptotic limit for Gaussian data with general covariance and any regularization, which includes the \ell_2-type regularizations considered here as well as more general ones such as \ell_p norms. Here is my understanding of the differences between the results of the two papers:
- In [1] the authors allow Gaussian features with any covariance matrix, whereas your paper allows non-Gaussian features so long as they have a bounded 12th central moment.